lifelong reinforcement learning
A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
In this paper we consider the problem of how a reinforcement learning agent that is tasked with solving a sequence of reinforcement learning problems (a sequence of Markov decision processes) can use knowledge acquired early in its lifetime to improve its ability to solve new problems. We argue that previous experience with similar problems can provide an agent with information about how it should explore when facing a new but related problem. We show that the search for an optimal exploration strategy can be formulated as a reinforcement learning problem itself, and demonstrate that such a strategy can leverage patterns found in the structure of related problems. We conclude with experiments that show the benefits of optimizing an exploration strategy using our proposed framework.
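The abstract's central idea, treating the search for an exploration strategy as an RL problem in its own right, can be illustrated with a toy sketch. Everything below is my own construction on a small chain world, not the paper's setup: a tabular "advisor" chooses the agent's exploratory actions and is reused across a sequence of related tasks. For simplicity the advisor here is rewarded myopically with the agent's immediate reward, whereas the paper's meta-MDP rewards the advisor with the agent's return.

```python
import random
from collections import defaultdict

N = 6                 # chain of states 0..N-1; the goal is usually on the right
ACTIONS = (-1, 1)     # move left / move right

def greedy(qtab, s):
    """Greedy action with random tie-breaking."""
    best = max(qtab[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if qtab[(s, a)] == best])

def run_task(goal, advisor_q, alpha=0.1, eps=0.3, episodes=30):
    q = defaultdict(float)    # task-specific action values (discarded per task)
    total = 0.0
    for _ in range(episodes):
        s = 0
        for _ in range(2 * N):
            explore = random.random() < eps
            # exploratory steps are delegated to the advisor's policy
            a = greedy(advisor_q if explore else q, s)
            s2 = min(max(s + a, 0), N - 1)
            r = 1.0 if s2 == goal else 0.0
            q[(s, a)] += alpha * (r + max(q[(s2, b)] for b in ACTIONS) - q[(s, a)])
            if explore:
                # the advisor's own (simplified, myopic) meta-reward
                advisor_q[(s, a)] += alpha * (r - advisor_q[(s, a)])
            total += r
            s = s2
            if r > 0:
                break
    return total

random.seed(0)
advisor_q = defaultdict(float)            # persists across the task sequence
returns = [run_task(N - 1, advisor_q) for _ in range(20)]
```

After a few tasks the advisor has learned that stepping right near the goal pays off, so exploratory actions in later tasks are biased toward the structure shared by the task family.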
Fast TRAC: A Parameter-Free Optimizer for Lifelong Reinforcement Learning
A key challenge in lifelong reinforcement learning (RL) is the loss of plasticity, where previous learning progress hinders an agent's adaptation to new tasks. While regularization and resetting can help, they require precise hyperparameter selection at the outset and environment-dependent adjustments. Building on the principled theory of online convex optimization, we present a parameter-free optimizer for lifelong RL, called TRAC, which requires no tuning or prior knowledge about the distribution shifts. Extensive experiments on Procgen, Atari, and Gym Control environments show that TRAC works surprisingly well, mitigating loss of plasticity and rapidly adapting to challenging distribution shifts, despite the underlying optimization problem being nonconvex and nonstationary.
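The abstract does not spell out TRAC's update rule, but the general shape of a parameter-free anchor-to-initialization scheme can be sketched. The code below is my own simplified stand-in, not the published algorithm: a base optimizer accumulates an offset from the initial parameters, and a single scalar scale on that offset is tuned online with an AdaGrad-style rule, so no step size for the scale is hand-picked.

```python
import numpy as np

class TinyTRAC:
    """Hypothetical sketch of a parameter-free anchor scheme (not TRAC itself)."""

    def __init__(self, theta0, base_lr=0.01):
        self.ref = np.array(theta0, dtype=float)   # anchor: initial parameters
        self.delta = np.zeros_like(self.ref)       # base optimizer's offset
        self.s = 0.0                               # learned scale on the offset
        self.base_lr = base_lr
        self.h_sq = 1e-12                          # running sum for AdaGrad

    def step(self, grad):
        grad = np.asarray(grad, dtype=float)
        self.delta -= self.base_lr * grad          # plain SGD on the offset
        h = float(grad @ self.delta)               # surrogate d(loss)/d(scale)
        self.h_sq += h * h
        self.s -= h / np.sqrt(self.h_sq)           # no hand-tuned scale step size
        return self.ref + self.s * self.delta      # effective parameters

opt = TinyTRAC([0.0])
params = np.array([0.0])
for _ in range(300):
    grad = 2.0 * (params - 3.0)    # gradient of the toy loss (x - 3)^2
    params = opt.step(grad)
```

When the scale stays near zero, the effective parameters stay near the initialization (countering plasticity loss); when progress is consistently useful, the scale grows and the base optimizer's updates take full effect.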
Reviews: A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
Even after the discussion and the author response there was still some disagreement among the reviewers. The paper proposes a simple yet novel and very interesting idea. There are still a few concerns about clarity, but those can be fixed in the final version (see updated reviews). Overall this is a solid paper that (as always) would benefit from a more thorough empirical evaluation. One reviewer proposed adding a further baseline: a domain-randomized robust policy trained on a variety of tasks.
Statistical Guarantees for Lifelong Reinforcement Learning using PAC-Bayesian Theory
Zhi Zhang, Chris Chow, Yasi Zhang, Yanchao Sun, Haochen Zhang, Eric Hanchen Jiang, Han Liu, Furong Huang, Yuchen Cui, Oscar Hernan Madrid Padilla
Lifelong reinforcement learning (RL) has been developed as a paradigm for extending single-task RL to more realistic, dynamic settings. In lifelong RL, the "life" of an RL agent is modeled as a stream of tasks drawn from a task distribution. We propose EPIC (Empirical PAC-Bayes that Improves Continuously), a novel algorithm designed for lifelong RL using PAC-Bayes theory. EPIC learns a shared policy distribution, referred to as the world policy, which enables rapid adaptation to new tasks while retaining valuable knowledge from previous experiences. Our theoretical analysis establishes a relationship between the algorithm's generalization performance and the number of prior tasks preserved in memory. We also derive the sample complexity of EPIC in terms of RL regret. Extensive experiments on a variety of environments demonstrate that EPIC significantly outperforms existing methods in lifelong RL, offering both theoretical guarantees and practical efficacy through the use of the world policy.
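The world-policy loop in the EPIC abstract can be sketched in miniature. The construction below is mine, not EPIC's actual update: a shared Gaussian over policy parameters is carried across a task stream; each task adapts a sample of it, and the shared mean then moves toward the adapted solution with a damped update standing in for the PAC-Bayes KL penalty that keeps the posterior close to the prior.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 2
mu, sigma = np.zeros(dim), 1.0    # world policy: N(mu, sigma^2 I) over parameters
trust = 0.3                       # damping weight (stand-in for the KL penalty)

def adapt(theta, target, steps=50, lr=0.2):
    """Per-task adaptation on a quadratic surrogate of the task objective."""
    for _ in range(steps):
        theta = theta - lr * 2.0 * (theta - target)
    return theta

# a stream of related tasks whose optima cluster around [1, 1]
task_optima = [np.array([1.0, 1.0]) + 0.1 * rng.normal(size=dim) for _ in range(30)]

dist_before = np.linalg.norm(mu - np.array([1.0, 1.0]))
for target in task_optima:
    theta = adapt(mu + sigma * rng.normal(size=dim), target)  # sample, then adapt
    mu = (1.0 - trust) * mu + trust * theta                   # damped mean update
dist_after = np.linalg.norm(mu - np.array([1.0, 1.0]))
```

After the stream, the shared mean sits near the task family's common structure, so a sample from the world policy starts each new task close to its optimum; EPIC's theory relates this kind of retained memory to a generalization bound.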
CHIRPs: Change-Induced Regret Proxy metrics for Lifelong Reinforcement Learning
John Birkbeck, Adam Sobey, Federico Cerutti, Katherine Heseltine Hurley Flynn, Timothy J. Norman
Reinforcement learning agents can achieve superhuman performance in static tasks but are costly to train and fragile to task changes. This limits their deployment in real-world scenarios where training experience is expensive or the context changes through factors like sensor degradation, environmental processes, or changing mission priorities. Lifelong reinforcement learning aims to improve sample efficiency and adaptability by studying how agents perform in evolving problems. The difficulty that these changes pose to an agent, however, is rarely measured directly. Agent performances can be compared across a change, but this is often prohibitively expensive. We propose Change-Induced Regret Proxy (CHIRP) metrics, a class of metrics for approximating a change's difficulty while avoiding the high costs of using trained agents. A relationship between a CHIRP metric and agent performance is identified in two environments: a simple grid world and MetaWorld's suite of robotic arm tasks. We demonstrate two uses for these metrics: for learning, an agent that clusters MDPs based on a CHIRP metric achieves 17% higher average returns than three existing agents in a sequence of MetaWorld tasks. We also show how a CHIRP can be calibrated to compare the difficulty of changes across distinctly different environments.
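The core CHIRP idea, scoring a change's difficulty from the MDPs themselves rather than from trained agents, can be sketched as follows. The concrete distance below is my own illustrative stand-in, not the paper's metric: matched (state, action) samples are pushed through the dynamics before and after a change, and the mean displacement between the outcomes serves as the proxy score.

```python
import numpy as np

rng = np.random.default_rng(1)

def dynamics(states, actions, wind):
    """Toy stochastic transition; `wind` is the parameter the change alters."""
    return states + actions + wind + 0.1 * rng.normal(size=states.shape)

def chirp(wind_before, wind_after, n=500):
    """Proxy difficulty of the change wind_before -> wind_after (no agent needed)."""
    states = rng.uniform(-1.0, 1.0, size=(n, 2))
    actions = rng.uniform(-1.0, 1.0, size=(n, 2))
    gap = (dynamics(states, actions, wind_before)
           - dynamics(states, actions, wind_after))
    return float(np.mean(np.linalg.norm(gap, axis=1)))

mild = chirp(0.0, 0.1)      # small shift in the dynamics scores low
drastic = chirp(0.0, 1.0)   # large shift in the dynamics scores high
```

Scores of this kind can then drive downstream decisions, for instance clustering MDPs so a policy is reused within a cluster (as in the paper's MetaWorld experiment), or thresholding the score to decide whether to keep adapting a policy or start fresh.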
Interview with Safa Alver: Scalable and robust planning in lifelong reinforcement learning
In their paper Minimal Value-Equivalent Partial Models for Scalable and Robust Planning in Lifelong Reinforcement Learning, Safa Alver and Doina Precup introduced special kinds of models that allow for performing scalable and robust planning in lifelong reinforcement learning scenarios. In this interview, Safa Alver tells us more about this work. It has long been argued that in order for reinforcement learning (RL) agents to perform well in lifelong RL (LRL) scenarios (which are scenarios like the ones we, biological agents, encounter in real life), they should be able to learn a model of their environment, which allows for advanced computational abilities such as counterfactual reasoning and fast re-planning. Even though this is a widely accepted view in the community, the question of what kinds of models would be better suited for performing LRL still remains unanswered. As LRL scenarios involve large environments with lots of irrelevant aspects and environments with unexpected distribution shifts, directly applying the ideas developed in the classical model-based RL literature to these scenarios is likely to lead to catastrophic results in building scalable and robust lifelong learning agents.
Towards Continual Reinforcement Learning: A Review and Perspectives
Khetarpal, Khimya | Riemer, Matthew (IBM Research, Mila, University of Montreal) | Rish, Irina | Precup, Doina
In this article, we aim to provide a literature review of different formulations and approaches to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We begin by discussing our perspective on why RL is a natural fit for studying continual learning. We then provide a taxonomy of different continual RL formulations by mathematically characterizing two key properties of non-stationarity, namely its scope and its driver. This offers a unified view of various formulations. Next, we review and present a taxonomy of continual RL approaches. We go on to discuss evaluation of continual RL agents, providing an overview of benchmarks used in the literature and important metrics for understanding agent performance. Finally, we highlight open problems and challenges in bridging the gap between the current state of continual RL and findings in neuroscience. While still in its early days, the study of continual RL has the promise to develop better incremental reinforcement learners that can function in increasingly realistic applications where non-stationarity plays a vital role. These include applications such as those in the fields of healthcare, education, logistics, and robotics.
Lifelong Machine Learning of Functionally Compositional Structures
A hallmark of human intelligence is the ability to construct self-contained chunks of knowledge and reuse them in novel combinations for solving different problems. Learning such compositional structures has been a challenge for artificial systems, due to the underlying combinatorial search. To date, research into compositional learning has largely proceeded separately from work on lifelong or continual learning. This dissertation integrated these two lines of work to present a general-purpose framework for lifelong learning of functionally compositional structures. The framework separates the learning into two stages: learning how to combine existing components to assimilate a novel problem, and learning how to adapt the existing components to accommodate the new problem. This separation explicitly handles the trade-off between stability and flexibility. This dissertation instantiated the framework into various supervised and reinforcement learning (RL) algorithms. Supervised learning evaluations found that 1) compositional models improve lifelong learning of diverse tasks, 2) the multi-stage process permits lifelong learning of compositional knowledge, and 3) the components learned by the framework represent self-contained and reusable functions. Similar RL evaluations demonstrated that 1) algorithms under the framework accelerate the discovery of high-performing policies, and 2) these algorithms retain or improve performance on previously learned tasks. The dissertation extended one lifelong compositional RL algorithm to the nonstationary setting, where the task distribution varies over time, and found that modularity permits individually tracking changes to different elements in the environment. The final contribution of this dissertation was a new benchmark for compositional RL, which exposed that existing methods struggle to discover the compositional properties of the environment.
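The dissertation's two-stage recipe, assimilation then accommodation, can be made concrete with a small linear sketch. The setup and names below are mine, not the dissertation's: "components" are shared linear maps; stage 1 fits only a new task's combination weights with the components frozen, and stage 2 then lightly updates the shared components themselves.

```python
import numpy as np

rng = np.random.default_rng(2)
D, K = 4, 3
modules = rng.normal(size=(K, D))            # shared, reusable components

def fit_task(x, y, modules, w_steps=300, m_steps=20, lr_w=0.05, lr_m=0.01):
    n = len(x)
    w = np.zeros(K)                          # this task's combination weights
    F = x @ modules.T                        # per-sample component outputs
    for _ in range(w_steps):                 # stage 1: assimilation (frozen modules)
        w -= lr_w * F.T @ (F @ w - y) / n
    for _ in range(m_steps):                 # stage 2: accommodation (small update)
        err = (x @ modules.T) @ w - y
        modules = modules - lr_m * np.outer(w, x.T @ err / n)
    return w, modules

true_parts = rng.normal(size=(2, D))         # latent structure shared by tasks
for _ in range(5):                           # a short stream of related tasks
    beta = rng.normal(size=2) @ true_parts   # each task mixes the same parts
    x = rng.normal(size=(100, D))
    y = x @ beta
    w, modules = fit_task(x, y, modules)

err = np.mean(((x @ modules.T) @ w - y) ** 2)   # residual on the last task
```

Separating the two stages is what handles the stability/flexibility trade-off the abstract describes: stage 1 cannot corrupt knowledge stored in the components, while stage 2 lets them drift only slowly toward the new task.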